Bottom Up Experimentation

This notebook documents the bottom-up strategy experimentation to determine notebook similarity. It is based on the notion that it is easier to aggregate than to break down a 'black box.'

The biggest challenge is working with the AST structure. Because it is a tree, we need to merge leafs with their parents, working our way up.

GOAL

There are two main goals:

  1. Come up with a similarity function for entire notebooks
  2. Maximize the coverage of 'black boxes' while minimizing the number of 'black boxes'

In [1]:
# Necessary imports 
import os
import time
from nbminer.notebook_miner import NotebookMiner
from nbminer.cells.cells import Cell
from nbminer.features.ast_features import ASTFeatures
from nbminer.stats.summary import Summary
from nbminer.stats.multiple_summary import MultipleSummary
from nbminer.features.featurize.ast_graph.ast_graph import *

In [2]:
people = os.listdir('../testbed/Final')
notebooks = []
for person in people:
    person = os.path.join('../testbed/Final', person)
    if os.path.isdir(person):
        direc = os.listdir(person)
        notebooks.extend([os.path.join(person, filename) for filename in direc if filename.endswith('.ipynb')])
notebook_objs = [NotebookMiner(file) for file in notebooks]
a = ASTFeatures(notebook_objs)

In [3]:
for i, nb in enumerate(a.nb_features):
    a.nb_features[i] = nb.get_new_notebook()

In [ ]:


In [4]:
graphs = []
for nb in a.nb_features:
    for cell in nb.get_all_cells():
        graphs.append(cell.get_feature('graph'))
agr = ASTGraphReducer(graphs)
num_nodes = []
for g in agr.graphs:
    num_nodes.append(g.graph_nodes())
print ('Total number of graphs:',agr.number_graphs())
print ('Total number of graphs with one node:',agr.number_single())
print ('Total number of nodes:',agr.count_nodes())
print (agr.count_nodes())
%matplotlib inline
import matplotlib.pyplot as plt
plt.hist(num_nodes, bins=30)


Total number of graphs: 19882
Total number of graphs with one node: 0
Total number of nodes: 289657
289657
Out[4]:
(array([  1.47890000e+04,   3.96700000e+03,   4.90000000e+02,
          1.66000000e+02,   1.07000000e+02,   8.30000000e+01,
          7.40000000e+01,   5.70000000e+01,   3.00000000e+01,
          1.90000000e+01,   1.90000000e+01,   9.00000000e+00,
          1.60000000e+01,   1.40000000e+01,   8.00000000e+00,
          6.00000000e+00,   7.00000000e+00,   4.00000000e+00,
          3.00000000e+00,   1.00000000e+00,   4.00000000e+00,
          1.00000000e+00,   1.00000000e+00,   3.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          1.00000000e+00,   0.00000000e+00,   2.00000000e+00]),
 array([   2.        ,   16.83333333,   31.66666667,   46.5       ,
          61.33333333,   76.16666667,   91.        ,  105.83333333,
         120.66666667,  135.5       ,  150.33333333,  165.16666667,
         180.        ,  194.83333333,  209.66666667,  224.5       ,
         239.33333333,  254.16666667,  269.        ,  283.83333333,
         298.66666667,  313.5       ,  328.33333333,  343.16666667,
         358.        ,  372.83333333,  387.66666667,  402.5       ,
         417.33333333,  432.16666667,  447.        ]),
 <a list of 30 Patch objects>)

In [5]:
cur_count = 0
new_count = 1
print (agr.count_nodes())
while cur_count != new_count:
    cur_count = new_count
    new_count = (agr.count_nodes())
    agr.build_relations()
print (new_count)


289657
37853

In [6]:
num_nodes = []
for g in agr.graphs:
    num_nodes.append(g.graph_nodes())
print ('Total number of graphs:',agr.number_graphs())
print ('Total number of graphs with one node:',agr.number_single())
print ('Total number of nodes:',agr.count_nodes())
print (agr.count_nodes())
%matplotlib inline
import matplotlib.pyplot as plt
plt.hist(num_nodes, bins=30)


Total number of graphs: 19882
Total number of graphs with one node: 17889
Total number of nodes: 37853
37853
Out[6]:
(array([  1.83560000e+04,   7.54000000e+02,   3.28000000e+02,
          1.53000000e+02,   9.80000000e+01,   6.90000000e+01,
          3.60000000e+01,   2.40000000e+01,   1.50000000e+01,
          7.00000000e+00,   1.10000000e+01,   5.00000000e+00,
          9.00000000e+00,   2.00000000e+00,   0.00000000e+00,
          3.00000000e+00,   2.00000000e+00,   0.00000000e+00,
          2.00000000e+00,   2.00000000e+00,   1.00000000e+00,
          2.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          1.00000000e+00,   0.00000000e+00,   1.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   1.00000000e+00]),
 array([   1. ,    4.9,    8.8,   12.7,   16.6,   20.5,   24.4,   28.3,
          32.2,   36.1,   40. ,   43.9,   47.8,   51.7,   55.6,   59.5,
          63.4,   67.3,   71.2,   75.1,   79. ,   82.9,   86.8,   90.7,
          94.6,   98.5,  102.4,  106.3,  110.2,  114.1,  118. ]),
 <a list of 30 Patch objects>)

In [7]:
# Similarity between nb 0 and all other notebooks:
print (sorted([similarity[1][1] for similarity in a.notebook_jaccard_similarity(0)]))


[0.05263157894736842, 0.06153846153846154, 0.07042253521126761, 0.07246376811594203, 0.07575757575757576, 0.07792207792207792, 0.07936507936507936, 0.08450704225352113, 0.08695652173913043, 0.08695652173913043, 0.08771929824561403, 0.09090909090909091, 0.09259259259259259, 0.09333333333333334, 0.0945945945945946, 0.0958904109589041, 0.0975609756097561, 0.10227272727272728, 0.10256410256410256, 0.10294117647058823, 0.10588235294117647, 0.10606060606060606, 0.10714285714285714, 0.1076923076923077, 0.1095890410958904, 0.1095890410958904, 0.11, 0.1111111111111111, 0.1111111111111111, 0.1111111111111111, 0.1111111111111111, 0.11235955056179775, 0.1125, 0.11290322580645161, 0.11290322580645161, 0.11428571428571428, 0.11494252873563218, 0.11538461538461539, 0.11538461538461539, 0.11538461538461539, 0.11594202898550725, 0.11666666666666667, 0.11688311688311688, 0.11764705882352941, 0.11764705882352941, 0.11842105263157894, 0.11864406779661017, 0.12048192771084337, 0.1206896551724138, 0.1206896551724138, 0.12280701754385964, 0.1232876712328767, 0.1232876712328767, 0.1232876712328767, 0.12345679012345678, 0.125, 0.125, 0.12643678160919541, 0.1267605633802817, 0.1267605633802817, 0.1267605633802817, 0.12698412698412698, 0.12698412698412698, 0.12857142857142856, 0.13114754098360656, 0.13157894736842105, 0.13253012048192772, 0.13333333333333333, 0.13432835820895522, 0.13432835820895522, 0.13513513513513514, 0.13559322033898305, 0.13559322033898305, 0.13559322033898305, 0.13559322033898305, 0.13636363636363635, 0.13636363636363635, 0.13636363636363635, 0.13793103448275862, 0.13793103448275862, 0.13846153846153847, 0.1388888888888889, 0.1388888888888889, 0.14102564102564102, 0.1411764705882353, 0.1411764705882353, 0.1411764705882353, 0.14285714285714285, 0.14285714285714285, 0.14285714285714285, 0.14285714285714285, 0.14492753623188406, 0.14492753623188406, 0.14492753623188406, 0.14516129032258066, 0.14516129032258066, 0.14634146341463414, 0.14705882352941177, 0.14814814814814814, 0.14864864864864866, 0.14925373134328357, 0.14925373134328357, 0.15, 0.1506849315068493, 0.1506849315068493, 0.15151515151515152, 0.15151515151515152, 0.1518987341772152, 0.15384615384615385, 0.15384615384615385, 0.15384615384615385, 0.15517241379310345, 0.15555555555555556, 0.1568627450980392, 0.1568627450980392, 0.15714285714285714, 0.15714285714285714, 0.15789473684210525, 0.15789473684210525, 0.15789473684210525, 0.15789473684210525, 0.15873015873015872, 0.15873015873015872, 0.15942028985507245, 0.16, 0.16129032258064516, 0.16129032258064516, 0.16176470588235295, 0.16176470588235295, 0.1643835616438356, 0.1643835616438356, 0.1643835616438356, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.16901408450704225, 0.16901408450704225, 0.16901408450704225, 0.16923076923076924, 0.16981132075471697, 0.1724137931034483, 0.17391304347826086, 0.17391304347826086, 0.17543859649122806, 0.17567567567567569, 0.17647058823529413, 0.17647058823529413, 0.17647058823529413, 0.17647058823529413, 0.1780821917808219, 0.17857142857142858, 0.18032786885245902, 0.18032786885245902, 0.18072289156626506, 0.18181818181818182, 0.18181818181818182, 0.18181818181818182, 0.1896551724137931, 0.19047619047619047, 0.19047619047619047, 0.1917808219178082, 0.1935483870967742, 0.1935483870967742, 0.19672131147540983, 0.2, 0.2, 0.2, 0.2028985507246377, 0.203125, 0.21428571428571427, 0.21794871794871795, 0.2222222222222222, 0.22413793103448276, 0.23809523809523808]

In [8]:
# Maximum similarity
all_sims = []
max_sim = 0
max_val = None
for i in range(len(a.nb_features)):
    for similarity in a.notebook_jaccard_similarity(i):
        if similarity[1][1] > max_sim:
            max_sim = similarity[1][1]
            max_val = (i, similarity[0])
max_sim, max_val


Out[8]:
(0.3404255319148936, (25, 39))

In [9]:
a.nb_features[2].notebook.filename


Out[9]:
'../testbed/Final/Aimee Montero aimee.montero@epfl.ch/FinalAimeeMontero.ipynb'

In [10]:
a.nb_features[3].notebook.filename


Out[10]:
'../testbed/Final/Akhilesh Gotmare akhilesh.gotmare@epfl.ch/ADA_Final_272620.ipynb'

Black Boxes

Now we're interested in what happened with this bottom up approach. What does the final thing look like? We can print out each graph and get a sense of what's happened, then we can look at some actual code, what it looks like in graph format, and what the black boxes it holds actually mean


In [21]:
for cell in a.nb_features[25].get_all_cells():
    print (cell.get_feature('graph').get_nodes())


['black_box1917']
['black_box_combo5']
['black_box1']
['black_box1']
['black_box1']
['black_box_combo66']
['black_box1102']
['black_box1102']
['black_box_combo16']
['black_box1285']
["<class '_ast.Expr'>", 'black_box4246']
['black_box3825']
['black_box3825']
['black_box1286']
["<class '_ast.Expr'>", 'black_box43']
['black_box1288']
['black_box1318']
['black_box1288']
['black_box1288']
['black_box1288']
['black_box492']
['black_box1169']
['black_box492']
['black_box1169']
['black_box492']
['black_box492']
['black_box1169']
['black_box492']
['black_box1169']
['black_box492']
['black_box_combo15']
['black_box1921']
['black_box1921']
['black_box1921']
['black_box1921']
['black_box1103']
['black_box2482']
['black_box1286']
['black_box_combo20']
["<class '_ast.Assign'>", "<class '_ast.Call'>", "<class '_ast.Num'>", 'black_box1428', 'black_box4']
['black_box91']
['black_box1286']
['black_box1286']
['black_box_combo20']
['black_box2482']
['black_box1103']
["<class '_ast.Assign'>", "<class '_ast.Call'>", "<class '_ast.Num'>", 'black_box1428', 'black_box4']
['black_box91']
['black_box1286']
['black_box1286']
['black_box_combo13']
['black_box2']
['black_box2']
['black_box2']
['black_box1']
['black_box518']
['black_box1134']
['black_box1103']
['black_box2482']
['black_box492']
['black_box1330']
['black_box2432']
['black_box2432']
["<class '_ast.Assign'>", "<class '_ast.Attribute'>", "<class '_ast.Call'>", "<class '_ast.Attribute'>", "<class '_ast.Subscript'>", "<class '_ast.ExtSlice'>", "<class '_ast.Slice'>", "<class '_ast.Slice'>", "<class '_ast.Load'>", 'black_box74', "<class '_ast.Load'>", 'black_box112', "<class '_ast.Load'>", 'black_box4']
["<class '_ast.Assign'>", 'black_box4', 'black_box297']
['black_box1291']
['black_box1291']
['black_box91']
['black_box513']
['black_box1291']
['black_box_combo93']
['black_box1924']
['black_box513']
['black_box1291']
['black_box_combo93']
['black_box1924']
['black_box2']
['black_box_combo175']
['black_box513']
['black_box1291']
['black_box_combo93']
['black_box1110']
['black_box494']
['black_box1924']
['black_box494']
['black_box1924']
['black_box2']
['black_box_combo21']
['black_box514']
['black_box495']

In [22]:
for cell in a.nb_features[39].get_all_cells():
    print (cell.get_feature('graph').get_nodes())


['black_box1917']
['black_box1']
['black_box1']
['black_box1']
['black_box1']
['black_box1']
['black_box_combo30']
['black_box1288']
['black_box2']
['black_box2']
['black_box2']
['black_box1102']
['black_box1102']
['black_box3825']
['black_box3825']
['black_box1291']
['black_box_combo145']
['black_box492']
['black_box492']
['black_box492']
['black_box492']
['black_box492']
['black_box1285']
['black_box492']
['black_box_combo15']
['black_box518']
['black_box518']
['black_box3155']
['black_box3155']
['black_box1103']
['black_box91']
['black_box_combo25']
['black_box91']
['black_box4307']
['black_box4307']
['black_box4307']
['black_box4307']
['black_box4307']
['black_box4307']
['black_box91']
['black_box492']
['black_box_combo197']
['black_box1341']
['black_box1288']
['black_box1288']
['black_box_combo198']
['black_box492']
['black_box_combo197']
['black_box1341']
['black_box1288']
['black_box1288']
['black_box_combo198']
['black_box492']
['black_box_combo197']
['black_box1341']
['black_box1288']
['black_box1288']
['black_box_combo198']
['black_box492']
['black_box_combo197']
['black_box1341']
['black_box1288']
['black_box1288']
['black_box_combo198']
['black_box492']
['black_box_combo197']
['black_box1341']
['black_box1288']
['black_box1288']
['black_box_combo198']
['black_box492']
['black_box_combo197']
['black_box1341']
['black_box1288']
['black_box1288']
['black_box_combo198']
['black_box518']
['black_box513']
['black_box_combo17']
['black_box1959']
["<class '_ast.Expr'>", "<class '_ast.Call'>", 'black_box16', 'black_box74', 'black_box1668']
['black_box518']
['black_box518']
['black_box1110']
['black_box123']
['black_box1102']
["<class '_ast.For'>", "<class '_ast.Assign'>", 'black_box4', 'black_box2053', 'black_box4', 'black_box3', 'black_box513', 'black_box1110', 'black_box1291', 'black_box2444', 'black_box4305']
['black_box1110']
['black_box_combo7']
['black_box1']
['black_box1169']
["<class '_ast.For'>", 'black_box4', 'black_box53', 'black_box494', 'black_box494', 'black_box495', 'black_box1129', 'black_box1937']
["<class '_ast.Assign'>", "<class '_ast.Call'>", 'black_box565', 'black_box1434', 'black_box4']
['black_box124']
["<class '_ast.AugAssign'>", "<class '_ast.Add'>", 'black_box4', 'black_box3']
["<class '_ast.For'>", "<class '_ast.For'>", "<class '_ast.Assign'>", 'black_box4', 'black_box2053', 'black_box4', 'black_box74', 'black_box513', 'black_box1110', 'black_box1291', 'black_box2444', 'black_box4305', 'black_box4', 'black_box53', 'black_box123', 'black_box494', 'black_box494', 'black_box495', 'black_box1110', 'black_box1126', 'black_box2444', 'black_box1937', 'black_box4308']

In [13]:
cells = []
for nb in a.nb_features:
    cells.extend([cell for cell in nb.get_all_cells()])
groups = []
cur_code = ''
cur_group = []
for cell in cells:
    if cell.get_feature('original_code') == cur_code:
        cur_group.append(cell)
    else:
        if len(cur_group) > 0:
            groups.append(cur_group)
        cur_group = []
    cur_code = cell.get_feature('original_code')

In [14]:
group = 6
print ('*'*50)
print ('Black Boxes')
for cell in groups[group]:
    print (cell.get_feature('graph').get_nodes())
print ('*'*50)
print ('Code')
print (groups[group][0].get_feature('original_code'))
print ('*'*50)
print ('Black Box meaning')
for cell in groups[group]:
    n = (cell.get_feature('graph').get_nodes())
    if len(n) == 1 and n[0][:5] == 'black':
        print (agr.get_trace(n[0]))


**************************************************
Black Boxes
['black_box2428']
['black_box2428']
['black_box2428']
['black_box1286']
["<class '_ast.Assign'>", "<class '_ast.Call'>", "<class '_ast.keyword'>", "<class '_ast.List'>", "<class '_ast.List'>", "<class '_ast.Str'>", "<class '_ast.Load'>", 'black_box95', "<class '_ast.List'>", "<class '_ast.Str'>", "<class '_ast.Load'>", 'black_box95', "<class '_ast.Load'>", 'black_box74', 'black_box112', 'black_box112', 'black_box4']
['black_box1287']
["<class '_ast.Expr'>", "<class '_ast.Call'>", 'black_box3', 'black_box95']
**************************************************
Code

# coding: utf-8

# In[ ]:

# First let's count number of overall favorite and retweets
favorites_epfl = df_epfl['retweet_count'].sum()
retweet_epfl = df_epfl['favorite_count'].sum()

favorites_eth = df_eth['retweet_count'].sum()
retweet_eth = df_eth['favorite_count'].sum()

#Let's plot this
plt.figure();
df_show = pd.DataFrame(data=[['EPFL', favorites_epfl+retweet_epfl], ['ETH', favorites_eth + retweet_eth]], columns=['Uni', 'Number of Tweet + Likes'], index=['EPFL', 'ETH'])
df_show.plot(kind='bar')

print(favorites_epfl + retweet_epfl)

#It appears EPFL is more present in the twitter game

#We could also have used a pie chart to show this data effectively


**************************************************
Black Box meaning
<class '_ast.Assign'> (black_box2428)
	 <class '_ast.Name'> (black_box4)
		 <class '_ast.Store'>
	 <class '_ast.Call'> (black_box808)
		 <class '_ast.Attribute'> (black_box208)
			 <class '_ast.Load'>
			 <class '_ast.Subscript'> (black_box78)
				 <class '_ast.Load'>
				 <class '_ast.Name'> (black_box3)
					 <class '_ast.Load'>
				 <class '_ast.Index'> (black_box5)
					 <class '_ast.Str'>
<class '_ast.Assign'> (black_box2428)
	 <class '_ast.Name'> (black_box4)
		 <class '_ast.Store'>
	 <class '_ast.Call'> (black_box808)
		 <class '_ast.Attribute'> (black_box208)
			 <class '_ast.Load'>
			 <class '_ast.Subscript'> (black_box78)
				 <class '_ast.Load'>
				 <class '_ast.Name'> (black_box3)
					 <class '_ast.Load'>
				 <class '_ast.Index'> (black_box5)
					 <class '_ast.Str'>
<class '_ast.Assign'> (black_box2428)
	 <class '_ast.Name'> (black_box4)
		 <class '_ast.Store'>
	 <class '_ast.Call'> (black_box808)
		 <class '_ast.Attribute'> (black_box208)
			 <class '_ast.Load'>
			 <class '_ast.Subscript'> (black_box78)
				 <class '_ast.Load'>
				 <class '_ast.Name'> (black_box3)
					 <class '_ast.Load'>
				 <class '_ast.Index'> (black_box5)
					 <class '_ast.Str'>
<class '_ast.Expr'> (black_box1286)
	 <class '_ast.Call'> (black_box224)
		 <class '_ast.Attribute'> (black_box74)
			 <class '_ast.Load'>
			 <class '_ast.Name'> (black_box3)
				 <class '_ast.Load'>
<class '_ast.Expr'> (black_box1287)
	 <class '_ast.Call'> (black_box225)
		 <class '_ast.keyword'> (black_box15)
			 <class '_ast.Str'>
		 <class '_ast.Attribute'> (black_box74)
			 <class '_ast.Load'>
			 <class '_ast.Name'> (black_box3)
				 <class '_ast.Load'>

In [31]:
print (agr.get_trace('black_box1288'))


<class '_ast.Expr'> (black_box1288)
	 <class '_ast.Call'> (black_box222)
		 <class '_ast.Str'>
		 <class '_ast.Attribute'> (black_box74)
			 <class '_ast.Load'>
			 <class '_ast.Name'> (black_box3)
				 <class '_ast.Load'>

In [24]:
for key in agr.names.keys():
    if 'Call' in key:
        print (key)


<class '_ast.Call'>

Can we go further

Now that we have a bunch of (hopefully) single element top level nodes, we can combine like pairs.


In [17]:
graph_sets = []
for nb in a.nb_features:
    graph_set = []
    for cell in nb.get_all_cells():
        graph_set.append(cell.get_feature('graph'))
    graph_sets.append(graph_set)

In [18]:
agc = ASTGraphCombiner(graph_sets)

In [19]:
print ('before',agc.count_graphs())
agc.reduce_graphs()
print ('after',agc.count_graphs())
print ('total_distinct',agc.count_distinct_nodes())


before 19882
after 10604
total_distinct 3047

In [20]:
for graph in agc.graph_sets[0]:
    print (graph.get_nodes())


['black_box_combo26']
['black_box_combo22']
['black_box3822']
['black_box_combo86']
['black_box511']
['black_box493']
['black_box1103']
["<class '_ast.For'>", "<class '_ast.Assign'>", "<class '_ast.Call'>", "<class '_ast.List'>", "<class '_ast.Load'>", 'black_box2345', 'black_box74', 'black_box4', 'black_box4', 'black_box807']
['black_box1103']
["<class '_ast.For'>", "<class '_ast.Assign'>", "<class '_ast.Call'>", "<class '_ast.List'>", "<class '_ast.Load'>", 'black_box2345', 'black_box74', 'black_box4', 'black_box4', 'black_box807']
['black_box1103']
['black_box91']
['black_box2428']
['black_box1286']
["<class '_ast.Assign'>", "<class '_ast.Call'>", "<class '_ast.keyword'>", "<class '_ast.List'>", "<class '_ast.List'>", "<class '_ast.Str'>", "<class '_ast.Load'>", 'black_box95', "<class '_ast.List'>", "<class '_ast.Str'>", "<class '_ast.Load'>", 'black_box95', "<class '_ast.Load'>", 'black_box74', 'black_box112', 'black_box112', 'black_box4']
['black_box1287']
["<class '_ast.Expr'>", "<class '_ast.Call'>", 'black_box3', 'black_box95']
['black_box_combo169']
['black_box_combo105']
['black_box121']
['black_box1918']
['black_box1286']
['black_box_combo169']
['black_box_combo105']
['black_box121']
['black_box1918']
['black_box1286']
['black_box_combo169']
['black_box_combo105']
['black_box121']
['black_box1918']
['black_box1286']
['black_box122']
['black_box1103']
["<class '_ast.For'>", "<class '_ast.Assign'>", 'black_box4', 'black_box2345', "<class '_ast.If'>", "<class '_ast.Assign'>", 'black_box79', 'black_box227', "<class '_ast.Assign'>", 'black_box79', 'black_box228', 'black_box136', 'black_box1107', 'black_box4', 'black_box807']
['black_box121']
['black_box512']
['black_box_combo87']
['black_box1110']
['black_box1111']
['black_box513']
["<class '_ast.Expr'>", 'black_box2047']
["<class '_ast.Assign'>", 'black_box4', 'black_box2047']
["<class '_ast.Expr'>", "<class '_ast.Call'>", "<class '_ast.Str'>", 'black_box3', 'black_box2047']
['black_box2']
["<class '_ast.Assign'>", "<class '_ast.Call'>", "<class '_ast.Call'>", "<class '_ast.Call'>", 'black_box25', 'black_box209', 'black_box3', 'black_box3', 'black_box3', 'black_box4']
['black_box494']
['black_box121']
['black_box512']
['black_box_combo87']
['black_box1110']
['black_box1111']
['black_box513']
["<class '_ast.Expr'>", 'black_box2047']
["<class '_ast.Expr'>", "<class '_ast.Call'>", "<class '_ast.Str'>", 'black_box3', 'black_box2047']
['black_box2']
["<class '_ast.Assign'>", "<class '_ast.Call'>", "<class '_ast.Call'>", "<class '_ast.Call'>", 'black_box25', 'black_box209', 'black_box3', 'black_box3', 'black_box3', 'black_box4']
['black_box494']
['black_box121']
['black_box512']
['black_box1110']
['black_box1104']
['black_box2429']
['black_box1110']
['black_box1111']
['black_box_combo17']
['black_box1919']
['black_box2']
["<class '_ast.Assign'>", 'black_box4', 'black_box814']
['black_box494']

Coverage

What is the final 'coverage' of our method? The best way to represent the coverage is to look at the top level graphs. We had a total of 19,882 graphs, and we covered all of these graphs with a total of 3,047 unique node types.


In [ ]: